AITopics | large-scale vision-language model

Collaborating Authors

large-scale vision-language model

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Stable and low-precision training for large-scale vision-language models

Neural Information Processing SystemsDec-24-2025, 04:51:55 GMT

We introduce new methods for 1) accelerating and 2) stabilizing training for large language-vision models.

large-scale vision-language model, name change, stable and low-precision training, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Vision (0.40)

Add feedback

Stable and low-precision training for large-scale vision-language models

Neural Information Processing SystemsOct-10-2024, 09:41:43 GMT

We introduce new methods for 1) accelerating and 2) stabilizing training for large language-vision models. Our main focus is int8 as GPU support for float8 is rare, though we also analyze float8 training through simulation. While SwitchBack proves effective for float8, we show that standard techniques are also successful if the network is trained and initialized so that large feature magnitudes are discouraged, which we accomplish via layer-scale initialized with zeros. As a result, we recommend an AdamW-Adafactor hybrid which avoids loss spikes when training a CLIP ViT-Huge model and outperforms gradient clipping at the scales we test.

large-scale vision-language model, loss spike, stable and low-precision training

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Vision (0.74)

Add feedback

MedDr: Diagnosis-Guided Bootstrapping for Large-Scale Medical Vision-Language Learning

He, Sunan, Nie, Yuxiang, Chen, Zhixuan, Cai, Zhiyuan, Wang, Hongmei, Yang, Shu, Chen, Hao

arXiv.org Artificial IntelligenceApr-23-2024

The rapid advancement of large-scale vision-language models has showcased remarkable capabilities across various tasks. However, the lack of extensive and high-quality image-text data in medicine has greatly hindered the development of large-scale medical vision-language models. In this work, we present a diagnosis-guided bootstrapping strategy that exploits both image and label information to construct vision-language datasets. Based on the constructed dataset, we developed MedDr, a generalist foundation model for healthcare capable of handling diverse medical data modalities, including radiology, pathology, dermatology, retinography, and endoscopy. Moreover, during inference, we propose a simple but effective retrieval-augmented medical diagnosis strategy, which enhances the model's generalization ability. Extensive experiments on visual question answering, medical report generation, and medical image diagnosis demonstrate the superiority of our method.

arxiv preprint arxiv, dataset, diagnosis, (13 more...)

arXiv.org Artificial Intelligence

2404.15127

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Asia > China > Hong Kong (0.04)
North America > United States (0.04)
Europe > France > Grand Est > Bas-Rhin > Strasbourg (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.68)

Add feedback

Daily Assistive View Control Learning of Low-Cost Low-Rigidity Robot via Large-Scale Vision-Language Model

Kawaharazuka, Kento, Kanazawa, Naoaki, Obinata, Yoshiki, Okada, Kei, Inaba, Masayuki

arXiv.org Artificial IntelligenceDec-12-2023

In this study, we develop a simple daily assistive robot that controls its own vision according to linguistic instructions. The robot performs several daily tasks such as recording a user's face, hands, or screen, and remotely capturing images of desired locations. To construct such a robot, we combine a pre-trained large-scale vision-language model with a low-cost low-rigidity robot arm. The correlation between the robot's physical and visual information is learned probabilistically using a neural network, and changes in the probability distribution based on changes in time and environment are considered by parametric bias, which is a learnable network input variable. We demonstrate the effectiveness of this learning method by open-vocabulary view control experiments with an actual robot arm, MyCobot.

experiment, linguistic instruction, robot, (13 more...)

arXiv.org Artificial Intelligence

2312.07451

Country:

Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.05)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.05)

Genre: Research Report (0.86)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.90)
Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (0.55)

Add feedback